Agent Tools Developer Guide

This guide covers how to add a new agent tool, how artifacts are laid out on disk, and how caching works for agent runs.

Operational Model

Annolid agent operations are split into two layers:

Self-improving: skills and memory evolve behavior without replacing installed code.
Self-updating: signed update workflow stages and applies software updates with rollback plans.

Self-improving

Skills: loaded with precedence workspace -> managed (~/.annolid/skills) -> bundled.
Hot reload: controlled by skills.load.watch and skills.load.pollSeconds.
Skill manifest validation: frontmatter is validated at load time; invalid manifests are marked unavailable.
Workspace memory: daily notes in memory/YYYY-MM-DD.md and curated long-term notes in memory/MEMORY.md.
Pre-compaction flush: transcript snapshot can be appended before compaction via memory flush helpers.
Memory retrieval plugin: default is local semantic ranking with keyword fallback (workspace_semantic_keyword_v1).
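
The skill precedence above can be illustrated with a small resolver. This is a sketch, not Annolid's loader: the resolve_skills function, the dict-based input shape, and the sample skill names are hypothetical. Only the workspace -> managed -> bundled ordering comes from this guide.

```python
from typing import Dict, Mapping

# Source order from highest to lowest precedence, as described in this guide.
PRECEDENCE = ("workspace", "managed", "bundled")


def resolve_skills(sources: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Return {skill_name: source_name}, letting higher-precedence sources win.

    `sources` maps a source name to {skill_name: skill_path}. Both this
    function and the input shape are hypothetical stand-ins.
    """
    resolved: Dict[str, str] = {}
    for source in PRECEDENCE:
        for skill_name in sources.get(source, {}):
            # setdefault keeps the first (highest-precedence) source seen.
            resolved.setdefault(skill_name, source)
    return resolved


# Made-up skills: "tracking" exists in all three sources, "citations" in one.
winners = resolve_skills(
    {
        "bundled": {"tracking": "bundled/tracking", "citations": "bundled/citations"},
        "managed": {"tracking": "~/.annolid/skills/tracking"},
        "workspace": {"tracking": "skills/tracking"},
    }
)
print(winners)
```

Here the workspace copy of tracking shadows the managed and bundled copies, while citations falls through to the bundled source.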

Self-updating

Channel-aware update manager supports stable, beta, and dev.
Pipeline: preflight -> stage -> verify -> apply -> restart marker -> post-check.
Rollback: rollback plan is generated for each run and executed on apply/post-check failures.
Canary policy: rollout can enforce rollback thresholds using sample count, failure-rate, and regression limits.
Safe update service: supports manifest check, artifact staging/download, checksum verification, signature verification, and transaction reporting.
Auto-update: disabled by default; configurable interval+jitter schedule when enabled (ANNOLID_AUTO_UPDATE_* env settings).
GUI controls: AI Model Settings -> Agent Runtime includes auto-update enable/channel/check-now/rollback and bot settings for skill hot reload, memory mode, and skill source locations.
Production safety policy: in production mode (ANNOLID_PRODUCTION_MODE=1 or ANNOLID_ENV=production), signed update manifests and signed non-builtin skills are required.
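
The pipeline and rollback rules above can be sketched as a stage runner. This is a simplified illustration under stated assumptions, not the update manager itself: run_update, the stage identifiers, and the boolean step callables are hypothetical. The stage order and the rule that rollback runs on apply/post-check failures are taken from this guide.

```python
from typing import Callable, Dict, List

# Stage order from the pipeline described in this guide; the identifiers
# themselves are hypothetical.
PIPELINE = ("preflight", "stage", "verify", "apply", "restart_marker", "post_check")
# Per this guide, the rollback plan executes on apply/post-check failures.
ROLLBACK_ON = frozenset({"apply", "post_check"})


def run_update(steps: Dict[str, Callable[[], bool]], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order, stop at the first failure, roll back if required."""
    log: List[str] = []
    for name in PIPELINE:
        ok = steps[name]()
        log.append(f"{name}:{'ok' if ok else 'failed'}")
        if not ok:
            if name in ROLLBACK_ON:
                rollback()
                log.append("rollback:executed")
            break
    return log


# Simulate a run where every stage succeeds except the post-check.
steps = {name: (lambda: True) for name in PIPELINE}
steps["post_check"] = lambda: False
rolled_back: List[str] = []
log = run_update(steps, rollback=lambda: rolled_back.append("restored"))
print(log)
```

In this sketch a failure before apply (for example in verify) stops the run without rollback, on the assumption that nothing has been installed yet at that point.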

How to add a tool

Define the tool by extending the base class in annolid/core/agent/tools/base.py:
- Implement run(self, ctx, payload) with your core logic.
- Use ctx.results_dir and ctx.run_id to derive stable outputs.
- Use ctx.artifact_store if you want to persist artifacts and participate in caching.
Register the tool in the registry:
- Add a new tool wrapper in annolid/core/agent/tools/.
- Export it from annolid/core/agent/tools/__init__.py.
- Register it with ToolRegistry (see annolid/core/agent/tools/registry.py).
Integrate with the runner (Phase 4+):
- Compose tools using the registry and a pipeline definition.
- Ensure inputs/outputs follow the unified data models in base.py.
Write a minimal test:
- Use tiny inputs and validate outputs.
- Prefer tests under tests/ that don’t require large external models.
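
The steps above can be sketched with a toy tool and a minimal test. Everything here is a stand-in: StubContext and FrameCountTool are hypothetical, and a real tool would extend the base class in annolid/core/agent/tools/base.py, whose exact interface may differ. Only the run(self, ctx, payload) shape and the ctx.results_dir and ctx.run_id attributes come from this guide.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class StubContext:
    """Hypothetical stand-in exposing the two ctx fields this guide names."""

    results_dir: Path
    run_id: str


class FrameCountTool:
    """Hypothetical tool; a real one would extend the base class in base.py."""

    name = "frame_count"

    def run(self, ctx: StubContext, payload: Dict[str, Any]) -> Dict[str, Any]:
        frames = payload.get("frames", [])
        # Derive a stable, run-scoped output path from ctx.
        out_path = ctx.results_dir / ".agent_runs" / ctx.run_id / "frame_count.txt"
        return {"count": len(frames), "output_path": out_path}


def test_frame_count_tool() -> None:
    """Minimal test: tiny input, no external models."""
    ctx = StubContext(results_dir=Path("results"), run_id="run-001")
    result = FrameCountTool().run(ctx, {"frames": [0, 5, 10]})
    assert result["count"] == 3
    assert result["output_path"].name == "frame_count.txt"
    assert "run-001" in result["output_path"].parts


test_frame_count_tool()
```

Keeping the output path a pure function of ctx.results_dir and ctx.run_id means repeated runs with the same run ID write to the same location.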

Artifact layout

Artifacts are stored per video results directory and organized as:

<results_dir>/
- agent.ndjson (default agent output)
- <video_name>_000000000.json + per-frame LabelMe JSON
- .agent_runs/<run_id>/ (run-scoped artifacts)
- .cache/agent_cache.json (cache metadata for re-run reuse)

The FileArtifactStore resolves paths relative to:

Run artifacts: .agent_runs/<run_id>/...
Cache artifacts: .cache/...

See annolid/core/agent/tools/artifacts.py for helpers.
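
As an illustration of the layout, the two path families can be computed as follows. These helper functions are hypothetical and are not the FileArtifactStore API; they only mirror the directory names listed above.

```python
from pathlib import Path


def run_artifact_path(results_dir: Path, run_id: str, name: str) -> Path:
    """Path of a run-scoped artifact: <results_dir>/.agent_runs/<run_id>/<name>."""
    return results_dir / ".agent_runs" / run_id / name


def cache_artifact_path(results_dir: Path, name: str) -> Path:
    """Path of a cache artifact: <results_dir>/.cache/<name>."""
    return results_dir / ".cache" / name


results_dir = Path("videos") / "mouse_trial"  # made-up results directory
print(run_artifact_path(results_dir, "run-001", "summary.json").as_posix())
print(cache_artifact_path(results_dir, "agent_cache.json").as_posix())
```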

Caching semantics

Agent runs compute a content hash from:

video path + filesystem stats (size/mtime),
behavior spec (full schema),
run config (stride, max frames, etc.),
model identifiers,
output NDJSON name.

If the cache hash matches and both the NDJSON output and the annotation store exist, the service returns the cached results without re-running the agent.
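
A content hash over the inputs listed above could be computed along these lines. This is a sketch under assumptions: the use of SHA-256, the canonical JSON encoding, the compute_cache_key name, and the field names are all illustrative choices, not the hashing scheme Annolid actually uses.

```python
import hashlib
import json
from typing import Any, Dict, List


def compute_cache_key(
    video_path: str,
    size: int,
    mtime: float,
    behavior_spec: Dict[str, Any],
    run_config: Dict[str, Any],
    model_ids: List[str],
    ndjson_name: str,
) -> str:
    """Hash every input that should invalidate cached results when it changes."""
    payload = {
        "video": {"path": video_path, "size": size, "mtime": mtime},
        "behavior_spec": behavior_spec,
        "run_config": run_config,
        "model_ids": model_ids,
        "ndjson_name": ndjson_name,
    }
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Made-up inputs: changing only the stride yields a different key.
base = dict(
    video_path="videos/mouse.mp4",
    size=1024,
    mtime=1700000000.0,
    behavior_spec={"behaviors": ["grooming"]},
    model_ids=["model-a"],
    ndjson_name="agent.ndjson",
)
key_a = compute_cache_key(run_config={"stride": 5, "max_frames": 100}, **base)
key_b = compute_cache_key(run_config={"stride": 10, "max_frames": 100}, **base)
print(key_a != key_b)
```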

To disable reuse from the CLI, run:

annolid-run agent --no-cache ...

Citation management tools

Annolid includes built-in BibTeX tooling for paper citation workflows:

CLI:
- annolid-run citations-list --bib-file refs.bib [--query ...]
- annolid-run citations-upsert --bib-file refs.bib --key mykey --title ... --author ... --year ...
- annolid-run citations-remove --bib-file refs.bib --key mykey
- annolid-run citations-format --bib-file refs.bib
Agent function tools:
- bibtex_list_entries
- bibtex_upsert_entry
- bibtex_remove_entry
- gui_save_citation (save from active PDF/web viewer context)

Examples in Annolid Bot message input:

save citation
list citations
list citations from references.bib for annolid
save citation from pdf as annolid2024 to references.bib
save citation from web
add citation @article{yang2024annolid, title={Annolid: Annotate, Segment, and Track Anything You Need}, author={Yang, Chen and Cleland, Thomas A}, journal={arXiv preprint arXiv:2403.18690}, year={2024}}
save citation from web with strict validation
save citation from pdf without validation
open threejs example two mice
open threejs example brain
open threejs html /tmp/annolid_threejs_examples/two_mice.html
open threejs https://example.org/viewer.html

Default behavior:

save citation first attempts Google Scholar BibTeX lookup from the active paper context, then falls back to Crossref/OpenAlex when needed, and saves the merged entry to .bib.

GUI workflow:

In the Annolid Bot input toolbar, click 📚 to open the citation manager.
Manage a .bib file, save citations from active PDF/web context, choose auto-validation or strict mode, view/edit a Source column (URL or PDF path), edit rows inline with year/DOI checks, and remove selected entries.

See also: docs/source/citations_tutorial.md for a full user tutorial.

Operator Commands

Use annolid-run commands for routine operations:

annolid-run agent skills refresh [--workspace <path>]
annolid-run agent skills inspect [--workspace <path>]
annolid-run agent memory flush [--workspace <path>] [--session-id <id>] [--note <text>]
annolid-run agent memory inspect [--workspace <path>]
annolid-run agent eval run --traces <jsonl> --candidate-responses <jsonl> --out <report.json>
annolid-run agent eval build-regression --workspace <path> --out <traces.jsonl> [--min-abs-rating 1]
annolid-run agent eval gate --changed-files <files.txt> --report <report.json> [--max-regressions 0] [--min-pass-rate 0.0]
annolid-run agent feedback add --workspace <path> --rating -1|0|1 [--trace-id <id>] [--comment <text>] [--expected-substring <text>]
annolid-run update check --channel stable|beta|dev [--require-signature]
annolid-run update run --channel stable|beta|dev [--execute] [--require-signature] [--skip-post-check] [--canary-metrics <json>]
annolid-run update rollback --install-mode package|source --previous-version <X.Y.Z> [--execute]

Admin Function APIs

The agent runtime also exposes operator-style function tools:

skills.refresh
memory.flush
eval.run
update.run
- update.run requires an explicit operator consent phrase when execute=true: APPROVE_ANNOLID_CORE_UPDATE (override with ANNOLID_OPERATOR_UPDATE_CONSENT_PHRASE).

Shell Session Tools

For OpenClaw-style shell lifecycle workflows, Annolid now provides session tools:

exec_start(command, working_dir?, background?, timeout_s?, pty?)
exec_process(action, session_id?, wait_ms?, tail_lines?, text?, submit?)

Supported exec_process.action values:

list, poll, log, write, submit, kill

Notes:

pty is accepted but currently not enabled (pty_supported=false in responses).
Basic dangerous command patterns are blocked at start time.
The runtime policy group (group:runtime) now includes exec, exec_start, and exec_process.

Improvement Quality Loop

Anonymized run traces: workspace/eval/run_traces.ndjson captures hashed session/channel/chat IDs and redacted text previews.
Explicit user feedback: workspace/eval/feedback.ndjson stores rating/comment/optional expected substring for promotion signals.
Regression dataset build: combines traces + feedback into eval traces for CI and pre-promotion checks.
Shadow mode: set ANNOLID_AGENT_SHADOW_MODE=1 to log alternative routing decisions to workspace/eval/shadow_routing.ndjson. Use annolid-run agent skills shadow --candidate-pack <dir> to compare candidate skill packs before promotion.

Governance and Audit

Governance events are stored as NDJSON with default path:

~/.annolid/governance/events.ndjson

You can override it with:

ANNOLID_GOVERNANCE_EVENTS_PATH=/custom/path/events.ndjson

Audited event categories include skill snapshot/refresh changes, memory writes/flushes, update stage/run actions, and rollback outcomes.
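
For ad hoc inspection, the events file can be located and parsed with a few lines of Python. This is a sketch, not an Annolid API: governance_events_path and read_events are hypothetical helpers, and no assumption is made about the fields inside each event. The default path and the environment variable come from this guide.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, List


def governance_events_path() -> Path:
    """Return the override path if set, otherwise the documented default."""
    override = os.environ.get("ANNOLID_GOVERNANCE_EVENTS_PATH")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".annolid" / "governance" / "events.ndjson"


def read_events(path: Path) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line; a missing file yields []."""
    events: List[Dict[str, Any]] = []
    if not path.exists():
        return events
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


print(governance_events_path())
```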

Three.js bot tools

Annolid Bot supports direct Three.js viewer control in GUI sessions.

Function tools:
- gui_open_threejs(path_or_url)
- gui_open_threejs_example(example_id)
Built-in example IDs:
- two_mice_html (default)
- brain_viewer_html
- helix_points_csv
- wave_surface_obj
- sphere_points_ply

The bot recognizes natural-language commands such as open threejs example ....

Browser Automation Safety

Annolid supports MCP browser automation with both granular tools and a unified tool:

mcp_browser (single control surface with actions: status|start|stop|navigate|snapshot|screenshot|act|wait)
mcp_browser_navigate, mcp_browser_click, mcp_browser_type, etc.

Navigation hardening:

Browser navigation allows only http://, https://, and about:blank.
Unsafe schemes such as file://, javascript:, and data: are blocked.
GUI open_url also blocks file://; use an explicit local file path instead.
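
The navigation rules above amount to a scheme allowlist, which can be sketched with the standard library. The is_navigation_allowed function is hypothetical and is not the check Annolid ships. It illustrates allowlisting (permit known-good schemes) rather than blocklisting individual unsafe ones.

```python
from urllib.parse import urlsplit

# Schemes permitted for browser navigation, per this guide.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_navigation_allowed(url: str) -> bool:
    """Allow http(s) URLs with a host, plus the literal about:blank."""
    candidate = url.strip()
    if candidate.lower() == "about:blank":
        return True
    parts = urlsplit(candidate)
    # Anything not on the allowlist (file, javascript, data, ...) is rejected.
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


for url in (
    "https://example.org/viewer.html",
    "about:blank",
    "file:///tmp/viewer.html",
    "javascript:alert(1)",
    "data:text/html,hello",
):
    print(url, is_navigation_allowed(url))
```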

Annolid code/docs Q&A and tutorials

Annolid Bot is optimized to answer Annolid-specific questions from local docs and code context.

It can explain modules, workflows, and settings with file-path references.
It can generate on-demand tutorials for requested topics and levels using the active chat model, grounded by Annolid docs/code evidence.
When a tutorial is saved to Markdown, Annolid Bot auto-opens the generated .md in the embedded web viewer.
Direct command examples:
- create on demand tutorial for realtime camera setup in annolid
- create beginner tutorial for behavior analysis and save to markdown file
- how do i use annolid for behavior analysis

Realtime camera snapshot + email

Annolid Bot can capture a snapshot from a camera stream and send it by email.

Stream snapshot:
- GUI sessions: use gui_check_stream_source with save_snapshot=true.
- This GUI tool now runs a full camera mission pipeline:
  - probe -> capture -> annotate -> notify/email
  - returns explicit camera_mission.steps and delivery status objects.
- Non-GUI channels (for example email/IM): use camera_snapshot.
- Snapshot files are saved under .annolid/workspace/camera_snapshots/.
- Outlook Safe Links camera URLs are automatically unwrapped to the original stream URL.
- Source fallback policy is intent-aware:
  - eye-blink intent defaults to camera 0
  - network camera intent prefers remembered network streams.
Email with attachments:
- Use the email tool with:
  - to
  - subject
  - content
  - optional attachment_paths (list of local file paths)

Example bot intent:

check wireless camera, save a snapshot, and email it to user@example.com

Realtime email/report spam control:

Realtime bot report interval controls report cadence.
Email requests use an additional minimum interval (bot_email_min_interval_sec, default 60s) to avoid repeated email requests.
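
The minimum-interval rule can be sketched as a small gate object. This is an illustration only: MinIntervalGate and its injectable clock are hypothetical, and the 60-second value mirrors the documented default of bot_email_min_interval_sec.

```python
import time
from typing import Callable, Optional


class MinIntervalGate:
    """Allow an action only if enough time has passed since the last allowed one."""

    def __init__(
        self,
        min_interval_sec: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_sec = min_interval_sec
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_sent: Optional[float] = None

    def allow(self) -> bool:
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval_sec:
            return False
        self._last_sent = now
        return True


# Drive the gate with a fake clock: requests at t=0s, t=30s, and t=61s.
fake_now = [0.0]
gate = MinIntervalGate(min_interval_sec=60.0, clock=lambda: fake_now[0])
decisions = []
for t in (0.0, 30.0, 61.0):
    fake_now[0] = t
    decisions.append(gate.allow())
print(decisions)
```

The request at 30 seconds is suppressed because it falls inside the 60-second window opened by the first email.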

Security and policy hardening (Phase 2)

Adds stricter defaults for tool access and data handling:

Capability-oriented tool profiles:
- gui, email, realtime, filesystem
- explicit capability expressions are supported, for example:
  - capability:gui,email
  - capability:gui+realtime
Snapshot path hardening:
- camera_snapshot writes only under workspace camera_snapshots/.
- symlink escape paths are rejected.
Redaction-at-source:
- private/local stream endpoints are redacted in outbound content.
- sensitive metadata keys (for example peer_id, account_id) are redacted before publish.
Runtime high-risk guard:
- deny-by-default blocks risky multi-tool chains unless explicit intent is provided.
- config toggle: agents.defaults.strict_runtime_tool_guard (default true).

Example config:

{
  "agents": {
    "defaults": {
      "strict_runtime_tool_guard": true
    }
  }
}
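
The snapshot path hardening described in this section can be illustrated with a containment check. This is a sketch under assumptions, not the camera_snapshot implementation: resolve_snapshot_path is a hypothetical helper, and only the camera_snapshots/ directory name and the rejection of symlink escapes come from this guide.

```python
from pathlib import Path


def resolve_snapshot_path(workspace: Path, filename: str) -> Path:
    """Return a resolved path under <workspace>/camera_snapshots or raise ValueError."""
    root = (workspace / "camera_snapshots").resolve()
    # resolve() collapses ".." and follows symlinks, so a path that escapes
    # through either mechanism ends up outside root and is rejected below.
    candidate = (root / filename).resolve()
    if root not in candidate.parents:
        raise ValueError(f"snapshot path escapes {root}: {filename!r}")
    return candidate


workspace = Path(".annolid") / "workspace"
print(resolve_snapshot_path(workspace, "cam0.jpg").name)
try:
    resolve_snapshot_path(workspace, "../outside.jpg")
except ValueError as exc:
    print("rejected:", exc)
```

Comparing resolved paths, rather than inspecting the raw string for "..", is what lets one check cover both traversal segments and symlinked directories.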

Explicit high-risk intent markers supported by policy/runtime guards:

intent:high-risk
intent:high_risk
allow:high-risk
allow_high_risk
unsafe:high-risk
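
A guard that honors these markers might look like the following. This is a hedged sketch: the function names, the case-insensitive substring matching, and the chain_is_risky flag are assumptions, since this guide does not specify how the runtime detects markers or classifies a tool chain as risky.

```python
# Markers listed in this guide.
HIGH_RISK_MARKERS = (
    "intent:high-risk",
    "intent:high_risk",
    "allow:high-risk",
    "allow_high_risk",
    "unsafe:high-risk",
)


def has_high_risk_intent(text: str) -> bool:
    """Assumed matching rule: case-insensitive substring search."""
    lowered = text.lower()
    return any(marker in lowered for marker in HIGH_RISK_MARKERS)


def guard_allows(chain_is_risky: bool, text: str, strict: bool = True) -> bool:
    """Deny risky chains by default unless the request carries an explicit marker."""
    if not strict or not chain_is_risky:
        return True
    return has_high_risk_intent(text)


print(guard_allows(True, "snapshot and email it"))
print(guard_allows(True, "snapshot and email it intent:high-risk"))
```

The strict parameter plays the role of agents.defaults.strict_runtime_tool_guard in this sketch: with it disabled, the guard lets every chain through.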

Session memory and replay

Annolid agent sessions now keep separated memory layers and replayable event logs.

Working memory:
- short-horizon session summary derived from recent user/assistant turns.
- stored in session metadata as working_memory.
- bounded by a character quota in PersistentSessionStore.
Long-term memory:
- stable facts/notes derived from session facts and consolidation updates.
- stored in session metadata as long_term_memory.
- bounded by a character quota in PersistentSessionStore.

Deterministic consolidation and telemetry

Memory consolidation now uses deterministic triggers based on:

session turn counter (turn_counter)
next scheduled consolidation turn (next_consolidation_turn)
history length relative to memory window

Telemetry is persisted in session metadata as memory_telemetry with entries like:

timestamp
outcome (for example llm_consolidated, skipped_short_transcript, not_due)
history_len, archive_len, keep_len
elapsed_ms

Memory mutation audit trail

Session metadata contains memory_audit_trail entries for memory changes, including:

timestamp
scope (facts, working_memory, long_term_memory)
mutation (for example set_fact, set_working_memory)
reason
turn_id
before_chars / after_chars

Safe replay for debugging

Session event records are stored in metadata key event_log.

Each entry includes:
- timestamp
- direction (inbound/outbound)
- kind (for example user, assistant, progress, final)
- optional turn_id, event_id, idempotency_key
- payload

GUI/backend helpers:

replay_session_debug_events(session_store=..., session_id=..., direction="", limit=200)
format_replay_as_text(events)

These helpers are implemented in:

annolid/core/agent/gui_backend/session_io.py
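
To show what replay over event_log entries involves, the sketch below filters and formats a list of records. It does not call or reproduce the helpers in session_io.py: select_events and format_events are hypothetical, the sample records are made up, and the treatment of an empty direction as "no filter" and of limit as "keep the most recent entries" are assumptions.

```python
from typing import Any, Dict, Iterable, List


def select_events(
    event_log: Iterable[Dict[str, Any]], direction: str = "", limit: int = 200
) -> List[Dict[str, Any]]:
    """Filter by direction (empty string = no filter) and keep the last `limit` entries."""
    matching = [e for e in event_log if not direction or e.get("direction") == direction]
    return matching[-limit:] if limit > 0 else []


def format_events(events: Iterable[Dict[str, Any]]) -> str:
    """Render one line per event using the entry fields named in this guide."""
    return "\n".join(
        f"{e.get('timestamp', '?')} [{e.get('direction', '?')}/{e.get('kind', '?')}] {e.get('payload')}"
        for e in events
    )


# Made-up records; the timestamp format is an assumption.
event_log = [
    {"timestamp": "2024-01-01T00:00:00Z", "direction": "inbound", "kind": "user", "payload": "hello"},
    {"timestamp": "2024-01-01T00:00:01Z", "direction": "outbound", "kind": "progress", "payload": "working"},
    {"timestamp": "2024-01-01T00:00:02Z", "direction": "outbound", "kind": "final", "payload": "done"},
]
print(format_events(select_events(event_log, direction="outbound")))
```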